The work reported here was performed between 1 June 1967 and 31 May 1968 
under Task I of the contract between Chemical Abstracts Service (CAS) and the 
National Science Foundation (NSF). The objective of this contract was to achieve an 
operational, integrated, man-machine system for manipulating information about 
chemical substances. The contract was directed at the development of a 
computer-based Chemical Compound Registry System through the buildup of machine 
and manual files of chemical structural representations, chemical nomenclature, 
molecular formulas, and bibliographic citations associated with the registered 
compounds. To meet this objective the contract called for the registration of 
compounds from several sources. Progress was made in the number of compounds 
registered from all sources. CAS worked to extend the range of compounds that 
could be machine-registered and to improve the overall efficiency and effectiveness 
of the system by shifting additional tasks to the computer to improve the technical 
and/or economic performance of the system. (RR) 
I. SUMMARY 



The work reported here was performed between 1 June 1967 and 31 May 
1968 under Task I of Contract NSF-C414 between CAS and the National Science 
Foundation (NSF). The overall objective of this contract is to achieve an 
operational, integrated, man-machine system for manipulating information 
about chemical substances. This system must be capable of high speed, 
flexibility, and depth in responding to the information needs of those who 
use chemical information. 

Task I of Contract NSF-C414 is directed at the development of a 
computer-based Chemical Compound Registry System through the buildup of 
machine and manual files of chemical structural representations, chemical 
nomenclature (including the systematic nomenclature developed for and used 
in Chemical Abstracts (CA) indexes, as well as trivial and other types of 
systematic names), molecular formulas, and bibliographic citations associ- 
ated with the registered compounds. 

To meet this objective the contract calls for the registration of com- 
pounds from several sources, including the current literature as indexed in 
CA, and the extensive CAS internal reference files. Progress has been made 
in the number of compounds registered this year from all sources. In addi- 
tion, CAS has worked to extend the range of compounds that can be machine- 
registered and to improve the overall efficiency and effectiveness of the 
system by shifting additional tasks to the computer to improve the techni- 
cal and/or economic performance of the system. 
II. REGISTRATION OF CHEMICAL SUBSTANCES 



To date, 887,104 unique substances have been filed in the system. 
These were obtained from CA indexes and CA reference files, as well as 
from other sources not included in Task I. The total number of registrations 
carried out is 1,697,257, representing the 887,104 unique substances 
on file plus 810,153 registrations of substances that matched one already on 
file. 
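
The relationship between these three figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions and are not part of the Registry System, and the share of registrations that produced a new substance is derived here from the two counts quoted above rather than stated in the text.

```python
# Counts quoted in the paragraph above (cumulative through 31 May 1968).
unique_substances = 887_104      # substances on file
matched_registrations = 810_153  # registrations that matched a substance already on file

# Every registration either adds a new substance or matches an existing one.
total_registrations = unique_substances + matched_registrations

# Derived figure (not quoted in the report): share of registrations that were new.
new_fraction = unique_substances / total_registrations

print(total_registrations)    # 1697257
print(f"{new_fraction:.1%}")  # 52.3%
```

On these figures, roughly half of all registrations to date have matched a substance already on file.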

A. Registration of Compounds Indexed in Chemical Abstracts 

During the period 29 May 1967 through 31 May 1968, CAS added 248,051 
unique compounds to the Registry Files as a result of registrations from 
the current literature as indexed in CA Subject Indexes. This brings to 
693,749 the total number of compounds registered from CA index processing 
since Contract C414 began. Registration for Volumes 62-66 of CA has been 
completed at this time. Volume 67 has so far contributed 142,640 registry 
transactions, which is an estimated 10% of the total to be included. Some 
7555 of the structures have been prepared, and 10% of the names have been 
keyboarded. Table III summarizes the machine and manual registrations 
resulting from registration of compounds from CA indexes. 
B. CAS Reference Files 

The registration of compounds from fourteen CAS reference files 
contributed 8,194 unique structures to Registry Files during the year 29 May 
1967 through 31 May 1968. This brings to 113,430 the total number of 
compounds registered from CAS reference files under Contract C414. 

Table III identifies the files and the number of compounds registered 
from each. Generally the contribution of any one file is relatively small: 
registration of the basic files was completed during earlier work under 
the contract, so only the updates to each file must now be registered. 

C. Sources Not Included in Task I 

In addition to the CA indexes and the CA reference files, CAS has 
registered 18,923 unique compounds from sources not specified as part of 
Task I. These include other government-funded work such as Task III of 
Contract C414. 
III. REGISTRY IMPROVEMENTS 



A. Registry Redesign and Reprogramming 

During this year, the redesign and reprogramming of the basic programs of 
the Structure Registry were completed. The 360 Structure Registry programs 
were installed on 8 April 1968. All programs are working satisfactorily, 
and only minor program problems were encountered at installation time. 
These were corrected rapidly and caused no processing delays. 

Alterations have been made to the bibliography system so that all no- 
menclature can be retained on the Bibliography File for more economical re- 
trieval via the Registry Number. Modifications were also made which allow 
the system to handle proposed structures through the retention of several 
molecular formulas. 

B. Improved Textual Descriptors 

CAS has developed improved conventions for deriving textual descriptors 
for certain specific types of alkaloids. The basis for the procedures 
is the structure and/or systematic name. The textual descriptors consist 
of the name of the parent structure with alphanumeric prefixes for positions 
whose stereochemical detail is not implied by the name. For example, 
in the alkaloid whose CA index name is Crinan-3A-ol, 1B,2B-epoxy-7-methoxy, 
Crinan is the parent compound and nodes 1, 2, and 3 have stereochemistry that 
must be specified. The special descriptor for this alkaloid is: 

1B,2B,3A-CRINAN. 
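
The descriptor convention can be illustrated with a short sketch. It is not the CAS procedure itself: the function name `textual_descriptor`, the dictionary input, and the ordering of prefixes by node number are assumptions made for illustration, chosen so that the Crinan example above is reproduced.

```python
def textual_descriptor(parent, stereo_positions):
    """Build a textual descriptor from a parent name and stereo labels.

    parent: name of the parent structure, e.g. "Crinan".
    stereo_positions: hypothetical mapping of node number -> stereo label
        ("A" or "B") for nodes whose stereochemistry the name does not imply.
    """
    # Assumption: prefixes are listed in ascending node order.
    prefixes = [
        f"{node}{label.upper()}"
        for node, label in sorted(stereo_positions.items())
    ]
    if not prefixes:
        return parent.upper()
    return ",".join(prefixes) + "-" + parent.upper()


# Crinan-3A-ol, 1B,2B-epoxy-7-methoxy: nodes 1, 2, and 3 carry stereo detail.
descriptor = textual_descriptor("Crinan", {3: "A", 1: "B", 2: "B"})
print(descriptor)  # 1B,2B,3A-CRINAN
```

Only the parent name and the stereo-bearing positions enter the descriptor; substituents such as the 7-methoxy group do not appear in it.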

Additional capabilities for metallic compounds, text descriptors, and 
abnormal masses have been added to the connection table edit program. 
TABLE I 

SUMMARY OF REGISTRATION 
1 June 1965 through 31 May 1968 

Source                           Unique Substances Registered 

A - Current Literature 
    CA Volume 62                                      114,816 
    CA Volume 63                                      131,197 
    CA Volume 64                                      119,869 
    CA Volume 65                                      106,265 
    CA Volume 66                                       98,336 
    CA Volume 67                                       92,650 
    Subtotal                                          663,133 

B - CAS Reference Files 
    Alkaloid File                                         697 
    Colour Index                                        3,270 
    Drug File                                             336 
    Fluorine                                           42,157 
    CA Formula Index Cross-References                     953 
    Lange Handbook of Chemistry                         4,183 
    Merck Index                                         6,005 
    Pesticide Index                                       156 
    CAS Reference File                                     17 
    Ring Index (plus supplements)                      21,901 
    SOCMA Handbook                                      6,475 
    CA Specific Volume Cross-References                 1,723 
   +Steroid File                                          349 
    CAS Subject Index Cross-References                 15,809 
   +Terpene File                                          518 
    USAN (United States Adopted Names)                     82 
    Subtotal                                          104,631 

C - Task III                                            2,146 

D - NON-C414                                           68,157 

E - Manual Registration (includes 6,097 mixtures)      49,037 

F - Total                                             887,104 

+ Not routinely updated 
TABLE II 

TOTAL NUMBER OF REGISTRATIONS BY SOURCE 

Source                      Registrations Through 31 May 1968 

A - Handled By Machine and Manually 
  1 - Current Literature 
    CA Volume 62                                      166,851 
    CA Volume 63                                      189,845 
    CA Volume 64                                      173,317 
    CA Volume 65                                      176,4[illegible] 
    CA Volume 66                                      159,[illegible] 
    CA Volume 67                                      142,640 a 
  2 - CAS Reference Files 
    Alkaloid File                                       2,987 
    Colour Index                                        9,059 
    Drug File                                             855 
    Fluorine                                           59,634 
    CA Formula Index Cross-References                   2,809 
    Lange Handbook of Chemistry                         8,939 
    Merck Index                                        16,538 
    Pesticide Index                                       482 
    CAS Reference File                            [illegible] 
    Ring Index (plus supplements)                      28,393 
    SOCMA Handbook                                      6,753 
    CA Specific Volume Cross-References           [illegible] 
    Steroid File b                                      1,070 
    CAS Subject Index Cross-References                 22,524 
    Terpene File b                                [illegible] 
    USAN (United States Adopted Names)            [illegible] 
  3 - Task III                                         20,620 
  4 - NON-C414                                        113,567 c 
    Total Structure Registrations                   1,310,405 

B - Handled By Name Match 
    CA Volumes                                        156,237 
    CAS Reference File                                 25,892 
    Task III                                           18,732 
    NON-C414                                          185,991 
    Total Registrations by Registry Number            386,852 

Total                                               1,697,257 ** 

a Processing incomplete as of 31 May 1968 
b Not routinely updated 
c Not charged to NSF 

** Includes all compounds processed before 29 May 1965 




TABLE III 

ADDITIONS TO THE FILE OF UNIQUE SUBSTANCES BY SOURCE 

                                             Through      29 May 1967      Total As Of 
Source                                   29 May 1967   to 31 May 1968      31 May 1968 

A - Machine Registration 
  1 - Current Literature 
    CA Volume 62                             109,021            5,795          114,816 
    CA Volume 63                             125,498            5,699          131,197 
    CA Volume 64                             113,374            6,495          119,869 
    CA Volume 65                              87,787           18,478          106,265 
    CA Volume 66                               6,309           92,027           98,336 
    CA Volume 67                                   -           92,650           92,650 
  2 - CAS Reference Files 
    Alkaloid File                                693                4              697 
    Colour Index                               3,149              121            3,270 
    Drug File                                    303               33              336 
    Fluorine                                  42,147               10           42,157 
    CA Formula Index Cross-References            882               71              953 
    Lange Handbook of Chemistry                3,355              828            4,183 
    Merck Index                                5,779              226            6,005 
    Pesticide Index                              149                7              156 
    CAS Reference File                             -               17               17 
    Ring Index (plus supplements)             20,034            1,867           21,901 
    SOCMA Handbook                             6,414               61            6,475 
    CA Specific Volume Cross-References        1,625               98            1,723 
   +Steroid File                                 349                -              349 
    CAS Subject Index Cross-References        14,700            1,109           15,809 
   +Terpene File                                 518                -              518 
    USAN (United States Adopted Names)            37               45               82 
  3 - Task III                                 1,672              474            2,146 
  4 - NON-C414                                52,572           15,585           68,157 
    Subtotal                                 596,367          241,700          838,067 

B - Manual Registration 
    CA Volume 62                                 804            5,828            6,632 
    CA Volume 63                                 471            2,738            3,209 
    CA Volume 64                               1,176            1,928            3,104 
    CA Volume 65                               1,258            7,388            8,646 
    CA Volume 66                                   -            6,312            6,312 
    CA Volume 67                                   -            2,713            2,713 
    CAS Reference Files                        5,102            3,697            8,799 
    Task III                                   5,735            2,951            8,686 
    NON-C414                                     623              313              936 
    Subtotal                                  15,169           33,868           49,037 

C - Grand Total                              611,536          275,568          887,104 

+ Not routinely updated 
TABLE IV 

FIRST SOURCE OF NAMES ON NOMENCLATURE FILE 

Source                              Names on File 31 May 1968 

A - Current Literature 
    CA Volume 62                                      169,859 
    CA Volume 63                                      178,307 
    CA Volume 64                                      218,198 
    CA Volume 65                                      211,0[illegible] 
    CA Volume 66                                      182,308 
    CA Volume 67                                       34,093 a 

B - CAS Reference Files 
    Alkaloid File                                       3,142 
    Colour Index                                       38,363 
    Drug Index                                          2,100 
    Fluorine                                           51,491 
    CA Formula Index Cross-References                   3,401 
    Lange Handbook of Chemistry                        11,791 
    Merck Index                                        25,901 
    Pesticide Index                                     3,373 
    CAS Reference File                                  2,896 
    Ring Index (plus supplements)                      19,844 
    SOCMA Handbook                                     27,292 
    CA Specific Volume Cross-References                13,247 
    Steroid File b                                      2,400 
    CAS Subject Index Cross-References                 60,1[illegible] 
    Terpene File b                                      3,181 
    USAN (United States Adopted Names)                  2,421 

Task III (NLM-FDA)                                     56,989 

NON-C414                                               74,319 c 

a Processing incomplete as of 31 May 1968 
b Not routinely updated 
c Not charged to NSF 
FIGURE I 

Percentage of Registry Transactions that Result in Substances New to the Registry File 